Case Study =========== This case study demonstrates the application of the TRIAGE analysis pipeline using a publicly available single-nucleus RNA sequencing (snRNA-seq) dataset. **Data Source:** - Dataset: `Kuppe et al., Nature 2022, Spatial multi-omic map of human myocardial infarction `_. - Data Available at: `https://www.nature.com/articles/s41586-022-05060-x#data-availability `_ - The demonstration uses control endothelial and pericyte cells to showcase the TRIAGE analysis pipeline. Objective: To apply `TRIAGEcluster` for cell clustering, `byPeak()` for calculating average expression data by various categories, `TRIAGEgene` for generating TRIAGE-weighted expression data, `plotJaccard()` for Jaccard Index Heatmap visualization, `TRIAGEparser` for dissecting gene clusters and their associated ontologies, `plotGO()` for GO enrichment visualization, and `getClusterGenes()` for the extraction of genes from TRIAGEparser clusters. **Steps:** 1. `TRIAGEcluster` + `byPeak()` 2. `TRIAGEgene` + `plotJaccard()` 3. `TRIAGEparser` + `plotGO()` 4. `getClusterGenes()` .. code-block:: R # Load packages and set up the working directory library(TRIAGE) library(reticulate) setwd("/path/to/working/directory") **Step 1:** Run `TRIAGEcluster` for cell clustering, select a suitable bandwidth resolution based on UMAP reviews, and calculate average gene expression by various categories using the `byPeak()` function. .. code-block:: R # Run TRIAGEcluster expr_file <- system.file("extdata", "TRIAGEcluster_demo_expr_human.csv", package = "TRIAGE") metadata_file <- system.file("extdata", "TRIAGEcluster_demo_metadata_human.csv", package = "TRIAGE") TRIAGEcluster(expr_file, metadata_file, outdir = "tests", output_prefix = "demo") # Select a suitable bandwidth and calculate average gene expression peak_file <- "tests/demo_bw0.80_metadata.csv" avg_peak <- byPeak(expr_file, peak_file) # Save the average gene expression result to a CSV file write.csv(avg_peak, file = "tests/AverageByPeak.csv", row.names = TRUE, quote = FALSE) TRIAGEcluster generates nine UMAPs with bandwidths from 0.1 to 0.9, and a corresponding set of metadata files. Here, a bandwidth of 0.8 was selected for downstream application, resulting in four TRIAGEcluster-defined peaks within the UMAP. .. image:: image/demo_bw0.80_labelledUMAP.jpg :alt: UMAP showing TRIAGEcluster-defined peaks A snippet of the "AverageByPeak.csv" file is shown below: +---------+------------+------------+------------+------------+ | | Peak0 | Peak1 | Peak2 | Peak3 | +=========+============+============+============+============+ | ALDH1A2 | 0.04404541 | 0.04426352 | 1.18265890 | 0.12415153 | +---------+------------+------------+------------+------------+ | PRRX1 | 0.63263328 | 0.19289223 | 0.08957362 | 0.11108879 | +---------+------------+------------+------------+------------+ | NOVA2 | 0.00785169 | 0.31212372 | 0.22311596 | 0.28179011 | +---------+------------+------------+------------+------------+ | NGFR | 0.07502976 | 0.01140611 | 0.01078702 | 0.02280367 | +---------+------------+------------+------------+------------+ | MCF2L | 0.03431053 | 1.41983845 | 0.25927713 | 1.39706381 | +---------+------------+------------+------------+------------+ | TBX4 | 0 | 0 | 0 | 0 | +---------+------------+------------+------------+------------+ | ... | ... | ... | ... | ... | +---------+------------+------------+------------+------------+ **Step 2:** Run `TRIAGEgene` to generate TRIAGE-weighted expression data and `plotJaccard()` to generate Jaccard similarity index heatmap. .. code-block:: R # Run TRIAGEgene to generate TRIAGE-weighted expression data (DS) ds <- TRIAGEgene(avg_peak) # Save the average DS result to a CSV file write.csv(ds, file = "tests/AverageByPeak_DS.csv", row.names = TRUE, quote = FALSE) # Save the average DS result to a tab-delimited text file write.table(ds, file = "tests/AverageByPeak.txt", sep = "\t", row.names = TRUE, col.names = NA, quote = FALSE) # Generate Jaccard Index Heatmap plotJaccard(ds, "tests/Jaccard_heatmap_peak.pdf") A snippet of the "AverageByPeak_DS.csv" file is shown below: +---------+----------------+----------------+----------------+----------------+ | | Peak0 | Peak1 | Peak2 | Peak3 | +=========+================+================+================+================+ | ALDH1A2 | 0.000685619 | 0.000689014 | 0.018409486 | 0.001932566 | +---------+----------------+----------------+----------------+----------------+ | PRRX1 | 0.019265802 | 0.005874214 | 0.002727817 | 0.003383026 | +---------+----------------+----------------+----------------+----------------+ | NOVA2 | 0.000207595 | 0.008252406 | 0.005899082 | 0.0074504 | +---------+----------------+----------------+----------------+----------------+ | NGFR | 0.012602077 | 0.001915782 | 0.001811799 | 0.003830128 | +---------+----------------+----------------+----------------+----------------+ | MCF2L | 0.000594645 | 0.024607587 | 0.004493599 | 0.024212873 | +---------+----------------+----------------+----------------+----------------+ | TBX4 | 0 | 0 | 0 | 0 | +---------+----------------+----------------+----------------+----------------+ | ... | ... | ... | ... | ... | +---------+----------------+----------------+----------------+----------------+ The plotJaccard() function generates a heatmap to visualize the extent of similarity across identified peaks: .. image:: image/Jaccard_heatmap_peak.jpg :alt: Jaccard similarity index heatmap **Step 3:** Run `TRIAGEparser` to dissect gene clusters and their associated ontologies and `plotGO()` to generate GO enrichment heatmaps. .. code-block:: R # Run TRIAGEparser to dissect gene clusters and their associated ontologies input_file <- "tests/AverageByPeak_DS.csv" TRIAGEparser(input_file, input_type = "table", outdir="tests/TRIAGEparser_peak") # Generate GO Heatmaps plotGO(indir="tests/TRIAGEparser_peak", outdir="tests/TRIAGEparser_peak") The TRIAGEparser output comprises gene clusters and their associated Gene Ontology (GO) enrichments, which are organized into two separate folders named “gene_cluster” and “go”. The plotGO() function generates heatmaps for gene clusters in each peak. The folder structure is illustrated below: .. image:: image/Screenshot_TRIAGEparser_output.jpg :alt: TRIAGEparser output folder structure GO enrichment heatmap for the 'Peak0' cluster: .. image:: image/Peak0_go_heatmap.jpg :alt: GO enrichment heatmap for 'Peak0' cluster GO enrichment heatmap for the 'Peak1' cluster: .. image:: image/Peak1_go_heatmap.jpg :alt: GO enrichment heatmap for 'Peak1' cluster GO enrichment heatmap for the 'Peak2' cluster: .. image:: image/Peak2_go_heatmap.jpg :alt: GO enrichment heatmap for 'Peak2' cluster GO enrichment heatmap for the 'Peak3' cluster: .. image:: image/Peak3_go_heatmap.jpg :alt: GO enrichment heatmap for 'Peak3' cluster **Step 4:** Since the Peak3 cluster4 is highly enriched in 'Cell differentiation' and 'Cell fate commitment' terms, we use `getClusterGenes()` to extract genes from this gene cluster for further downstream analysis. .. code-block:: R # Extract genes from Peak3 cluster4 Peak3_cluster4_genes <- getClusterGenes("tests/TRIAGEparser_peak/gene_cluster/Peak3_gene_clusters.csv", "cluster4") **Alternative Calculations 1: Average Gene Expression by Cluster:** Objective: To calculate average gene expression based on cluster categories using `byPeak()` function, followed by `TRIAGEgene` analysis and Jaccard index heatmap generation. .. code-block:: R expr_file <- system.file("extdata", "TRIAGEcluster_demo_expr_human.csv", package = "TRIAGE") peak_file <- "tests/demo_bw0.80_metadata.csv" # Calculate average gene expression by cluster avg_cluster <- byPeak(expr_file, peak_file, peak_col = "final_cluster") write.csv(avg_cluster, file = "tests/AverageByCluster.csv", row.names = TRUE, quote = FALSE) # Run TRIAGEgene ds_cluster <- TRIAGEgene(avg_cluster) write.csv(ds_cluster, file = "tests/AverageByCluster_DS.csv", row.names = TRUE, quote = FALSE) # Generate Jaccard index heatmap plotJaccard(ds_cluster, "tests/Jaccard_heatmap_cluster.pdf") # Run TRIAGEparser input_file <- "tests/AverageByCluster_DS.csv" TRIAGEparser(input_file, input_type = "table", outdir="tests/TRIAGEparser_cluster") # Generate GO Heatmaps plotGO(indir="tests/TRIAGEparser_cluster", outdir="tests/TRIAGEparser_cluster") **Alternative Calculations 2: Average Gene Expression by Cell Type** Objective: To calculate average gene expression based on cell type categories using `byPeak()` function, followed by `TRIAGEgene` analysis and Jaccard index heatmap generation. .. code-block:: R expr_file <- system.file("extdata", "TRIAGEcluster_demo_expr_human.csv", package = "TRIAGE") peak_file <- "tests/demo_bw0.80_metadata.csv" # Calculate average gene expression by cell type avg_celltype <- byPeak(expr_file, peak_file, peak_col = "cell_type") write.csv(avg_celltype, file = "tests/AverageByCelltype.csv", row.names = TRUE, quote = FALSE) # Run TRIAGEgene ds_celltype <- TRIAGEgene(avg_celltype) write.csv(ds_celltype, file = "tests/AverageByCelltype_DS.csv", row.names = TRUE, quote = FALSE) # Generate Jaccard index heatmap plotJaccard(ds_celltype, "tests/Jaccard_heatmap_celltype.pdf") # Run TRIAGEparser input_file <- "tests/AverageByCelltype_DS.csv" TRIAGEparser(input_file, input_type = "table", outdir="tests/TRIAGEparser_celltype") # Generate GO Heatmaps plotGO(indir="tests/TRIAGEparser_celltype", outdir="tests/TRIAGEparser_celltype")